The need for high-precision numerical computing is
central to both scientific and general-purpose computation.
In achieving this high precision, conventional computers
have employed weighted number systems with a fixed radix
(base), e.g., the binary number system (base 2). Advantages
of  such  weighted  number  systems  include  the  ease  with  which 
magnitude  comparison,  sign  detection,  overflow  detection, 
digital-to-analog  conversion,  dynamic  range  extension,  and 
multiplication  or  division  by  a  power  of  the  base  can  be 
performed.  However,  for  arithmetic  operations  such  as 
addition,  subtraction  and  multiplication,  inherent 
propagation  of  carries  between  successive  digits  precludes 
truly  parallel  computation  in  a  weighted  number  system. 
Furthermore,  this  characteristic  imposes  a  fundamental 
limitation  on  the  speed  at  which  arithmetic  computation  can 
be  performed. 

Approaches  to  sidestep  the  speed  limitation  can 
generally  be  classified  into  two  main  categories.  First, 
one  can  "look  ahead"  and  calculate  the  carries,  reducing  the 
carry  propagation  time  at  the  cost  of  additional  circuitry. 
The  second  approach  is  to  consider  alternate  number  systems 
for  data  representation  and  computation  which  have  unique 
characteristics  with  respect  to  carries.  We  will  follow  the 
latter  approach  and  consider  the  Residue  Number  System  (RNS) 
for  high-speed,  high-accuracy  numerical  computation. 

Another  variable  when  considering  numerical  computation 
is  the  choice  of  computing  media.  Optics  appears  as  a  good 
host  for  RNS-based  computation  due  to  the  fact  that  many 
features  of  the  RNS  couple  well  with  optical  processing. 
Natural  cyclic  phenomena  such  as  polarization  and  phase  of 
light  beams  are  candidates  for  residue  representation  which 
is  also  cyclic.  Both  residue  and  optics  also  present  the 
capability  to  perform  parallel  carry-free  computation.  The 
RNS  also  exhibits  the  feature  of  dividing  high-accuracy 
computation  into  several  independent  medium-accuracy 
"modules."  Linear  optical  systems  implementing  global 
interconnects  in  conjunction  with  fast  simple  optical  and 
hybrid  nonlinear  devices  represent  another  viable  approach 
to  performing  RNS  computation. 

Optics and the RNS were first united to perform numerical
computation as early as 1932. However, only in the past 15
years have their mutual properties been used to gain some
computational advantage. The most general form of RNS
optical processing is realized by look-up tables (LUTs).
Recently two groups, Westinghouse and Boeing Aerospace, have
proposed position-coded residue (PCR) LUT-based processing
systems which provide a reduction in the complexity associated
with LUT processing. The complexity savings result from
exploiting key features of remainder arithmetic found in LUT
processing.

Under  this  task,  we  present  an  analysis  of  the  optical 
residue  look-up  table  processors.  The  initial  stage  of  the 
study  is  an  investigation  of  the  unique  features  of  RNS 
arithmetic  which  become  visible  when  realized  in  LUT 
architectures,  namely  constant  cross-diagonal  elements  of 
the  addition  LUT  and  the  zero  row  and  column  of  the 
multiplication  LUT  resulting  from  multiplication  by  zero. 
These  features  are  expressed  in  terms  of  cyclic  properties 
of  RNS  arithmetic  and  traced  to  their  foundation  in  Group 
Theory.  Addition  (modulo  m)  is  inherently  a  group 
operation,  and  multiplication  (modulo  m)  can  be  transformed 
into  a  group  operation  (modulo  m-1) ,  both  of  which  possess 
the  advantages  of  group  operation  LUT  processing. 

These  insights  are  subsequently  used  to  study  the 
particular  Westinghouse  and  Boeing  approaches  to  RNS  LUT 
processing.  Considering  the  RNS  modular  representation  as 
factoring  the  system  dynamic  range,  the  Westinghouse  group 
has  proposed  a  "second-level  factorization"  of  the  moduli, 
further reducing the system complexity. The above-mentioned
RNS multiplication properties allow factoring (m-1) (for
each modulus m) into p factors (k_1, ..., k_p). Processing is
performed in p independent (k_i x k_i) LUTs, leading to a
significant reduction in the total number of LUT entries.
However, addition proves to be more complex in the factored
domain. The Boeing LUT processor is based on the cross-diagonal
symmetry present in group operation tables. For
LUT  processing,  one  only  needs  to  locate  the  appropriate 
cross-diagonal  of  the  table  for  the  calculation  of  the 
output,  effectively  reducing  the  number  of  table  locations. 
As  mentioned,  the  addition  LUT  possesses  this  symmetry  and 
the  multiplication  LUT  can  be  transformed  into  a  symmetric 
group  table. 

With the proper RNS and LUT arithmetic processing
background, the next step is performance analysis. We
present a comparative study of the currently proposed
optical LUT processors with respect to traditional LUT
processors from an architectural standpoint. The approach
is to specify the LUT processors in terms of fundamental
arithmetic processing units (APUs), namely multipliers,
adders, and multiplier-accumulators (MAUs). The idea is
that carefully specifying the APUs, and requiring that
input and output data formats be compatible, will
provide common ground which equalizes the processors and
reveals all costs. In an effort to decouple the
architectures from hardware specifics, "computational
components" are chosen as the fundamental blocks from which
the APU architectures will be constructed. These components
include basic LUTs, transforms, encoders, decoders, and
specialized hardware such as zero-detectors. Along with the
APUs, we propose a set of performance criteria on which the
architectures are to be evaluated. In an effort to span the
range of performance issues and their trade-offs, four
high-level criteria are chosen. They are:

(1)  temporal  complexity  -  the  number  of  sequential 
stages  required  to  perform  a  desired  operation, 

(2)  spatial  complexity  -  the  number  of  active 
decision-making  elements  in  the  system, 

(3)  interconnect  complexity  -  measured  as  a  function  of 
the  average  "fan-out"  per  input  channel  and  "fan-in"  to  each 
output  channel.  The  interconnects  can  also  be  classified 
according  to  uniformity  from  channel  to  channel  as  "shift 
variant"  or  "shift  invariant," 

(4)  element  complexity  -  the  average  number  of 
resolvable  levels  required  of  the  active  elements,  i.e.  the 
element  dynamic  range. 

Notice  that  the  performance  criteria  can  all  be  related  to 
specific  cost  issues  at  the  hardware  level. 

The  performance  analysis  proceeds  as  follows.  The 
fundamental  components  are  specified  in  block  form  and 
analyzed  on  the  basis  of  our  listed  criteria.  The  component 
complexities  are  listed  in  a  table  for  easy  access  and 
comparison.  Multiplier,  adder  and  MAU  block  diagrams  are 
then  built  from  these  components  for  each  specific  approach. 
Evaluation at the APU level consists of simply adding the
complexities of each of the components. The results of APU
complexity for each approach, Westinghouse, Boeing and
direct, are compiled in a table. Spatial complexity,
however, shows a strong dependence upon the moduli, which is
best seen graphically, and is therefore also plotted.
Conclusions  regarding  the  relative  performance  of  the  LUT 
architectures  as  a  function  of  modulus,  as  well  as  general 
conclusions,  can  be  readily  extracted  from  the  resulting 
tables  and  graphs. 

The  next  dimension  in  the  LUT  performance  analysis  is  a 
discussion  of  hardware  issues.  Overall  system  performance 
parameters,  such  as  throughput,  power  consumption, 
connectivity,  and  stability  are  determined  by  combining  the 
architectural  characteristics  with  hardware  characteristics. 
From  the  previous  analysis,  required  hardware  can  be  divided 
into  two  categories,  interconnects  and  active  switching 
elements.  The  architectural  analysis  provides  the  mapping 
from  algorithms  to  devices,  that  is,  the  hardware  selection 
is  guided  by  the  architecture  for  each  particular  approach 
to  LUT  processing.  In  this  section,  we  identify  the 
requirements  for  interconnects  and  switching  elements  as 
dictated  by  the  processing  components.  Next,  we  list  the 
various  optical  technologies  capable  of  performing  the 
required  connection  or  switch,  along  with  the  respective 
advantages  and  disadvantages.  It  becomes  apparent  that 
there  is  a  level  of  interaction  and  associated  trade-offs 
between architectures and hardware. In this light, we see
that the architectures help specify which technology is most
amenable to that type of LUT processing. Alternatively, one
might also conclude that no device technology meets the
requirements of a particular architecture.

In  summary,  we  adopt  a  systematic  and  comprehensive 
approach  to  investigating  optical  RNS  processing.  We 
identify  the  algorithms  and  architectures  of  three 
approaches  to  LUT  processing  on  the  common  ground  of 
arithmetic  processing  units.  Performance  criteria  are 
chosen  such  that  cost  trade-offs  are  made  explicit  and  not 
postponed.  The  architectures  are  then  mapped  onto  devices 
and  technologies  suited  to  the  particular  approach.  It  is 
felt  that  only  in  this  domain  can  one  fully  assess  a 
reduction in complexity of one approach over another. Such
a systematic approach has helped us identify the
complexities of the two leading approaches (Boeing and
Westinghouse) for computationally useful operations and
locate their origins. This
understanding  cannot  be  obtained  by  a  totally  integrated 
performance  analysis  that  gives  a  parts  count,  throughput 
and  power  consumption  estimation  for  a  specific  hardware 
implementation  of  a  specific  architecture  which  is  based 
upon  a  specific  algorithm. 

The  report  will  proceed  as  follows.  The  first  section 
is  a  tutorial  on  the  basics  of  residue  arithmetic,  including 
various advantages and disadvantages of the number system.
The next section provides some background by tracing the
history of RNS-based optical processing. Following the
introductory material, we will narrow our focus to
optical LUT processing, and will investigate key features of
the RNS which make the more recent approaches attractive.
The  fourth  section  presents  the  detailed  architectural 
performance  analysis  of  optical  RNS  LUT  processing  based  on 
fundamental  performance  criteria  and  fundamental 
computational  units.  The  next  section  presents  the  results 
of  the  performance  analysis,  along  with  conclusions  based 
upon  the  results.  In  the  last  section,  we  will  map  the 
architectural  results  onto  hardware  considerations,  rounding 
out  the  analysis.  The  report  concludes  with  a  comprehensive 
general  reference  section. 
I. INTRODUCTION TO RNS: A TUTORIAL

II. RNS IN OPTICAL PROCESSING

III. OPTICAL RNS LOOK-UP TABLE PROCESSING

IV. PERFORMANCE ANALYSIS OF LOOK-UP TABLE PROCESSING

V. RESULTS

VI. HARDWARE CONSIDERATIONS
WHY

IN TRADITIONAL WEIGHTED NUMBER
SYSTEMS (BINARY, DECIMAL, ...), INHERENT
PROPAGATION OF CARRIES PRECLUDES
TRULY PARALLEL COMPUTATION.

-> THIS POSES A FUNDAMENTAL LIMITATION
ON THE SPEED AT WHICH ARITHMETIC
OPERATIONS CAN BE PERFORMED.

TWO APPROACHES TO CIRCUMVENT THE
LIMITATION:

I. ADDITIONAL CIRCUITRY TO
LOOK-AHEAD

II. ALTERNATE NUMBER SYSTEMS WITH
SPECIAL CARRY CHARACTERISTICS
INTRODUCTION TO THE
RESIDUE NUMBER SYSTEM:
A TUTORIAL
THE RESIDUE NUMBER SYSTEM

* SELECT N PAIRWISE RELATIVELY PRIME INTEGER MODULI
m_1, m_2, ..., m_N AS SYSTEM BASE

* INTEGER X IS REPRESENTED AS AN N-TUPLE OF RESIDUES

X => (R_1, R_2, ..., R_N)

where R_i = the RESIDUE of X modulo m_i = |X|_{m_i}

and 0 <= R_i <= m_i - 1 for each R_i

* THE REPRESENTATION FOR X IS UNIQUE IN THE DOMAIN
0 <= X <= M - 1, where M = m_1 * m_2 * ... * m_N

e.g. TO PERFORM A 16-bit MULTIPLY REQUIRES 32-bit DYNAMIC
RANGE -> m_i = 5,7,9,11,13,16,17,19,23

* EXAMPLE:

BASE MODULI: m_i = 2,3,5

INTEGERS: X_1 = 7, X_2 = 4

REPRESENTATION: X_1 => (1,1,2), X_2 => (0,1,4)

ADDITIVE INVERSE: -X_i = |m_i - R_i|_{m_i}

-X_1 => (1,2,3), -X_2 => (0,2,1)

DYNAMIC RANGE: M = 2*3*5 = 30
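
The encoding rules on this slide can be checked with a minimal Python sketch. The function names (`to_rns`, `negate`, `pairwise_coprime`) are illustrative and not taken from the report, and the nine-modulus set is read here as 5,7,9,11,13,16,17,19,23, which is an interpretation of a scan in which 3 and 5 are easily confused.

```python
from math import gcd, prod

def to_rns(x, moduli):
    """Residue tuple (|x|_m1, ..., |x|_mN) of a non-negative integer x."""
    return tuple(x % m for m in moduli)

def negate(residues, moduli):
    """Digit-wise additive inverse: |m_i - R_i|_{m_i}."""
    return tuple((m - r) % m for r, m in zip(residues, moduli))

def pairwise_coprime(moduli):
    """True when every pair of moduli has greatest common divisor 1."""
    return all(gcd(a, b) == 1
               for i, a in enumerate(moduli)
               for b in moduli[i + 1:])

moduli = (2, 3, 5)
x1 = to_rns(7, moduli)          # (1, 1, 2)
x2 = to_rns(4, moduli)          # (0, 1, 4)
minus_x1 = negate(x1, moduli)   # (1, 2, 3)
minus_x2 = negate(x2, moduli)   # (0, 2, 1)
dynamic_range = prod(moduli)    # 30

# Assumed reading of the nine-modulus set quoted for a 32-bit dynamic range.
wide = (5, 7, 9, 11, 13, 16, 17, 19, 23)
covers_32_bits = pairwise_coprime(wide) and prod(wide) > 2**32
```

Under that reading the product of the nine moduli exceeds 2^32, which is consistent with the slide's claim for a 16-bit multiply.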
THE RESIDUE NUMBER SYSTEM

* DECODING THE RESIDUE REPRESENTATION

THE CHINESE REMAINDER THEOREM:

- PERFORMS RESIDUE TO ANALOG CONVERSION

- GIVEN THE RESIDUE REPRESENTATION (R_1, R_2, ..., R_N)
THE CRT DETERMINES |X|_M

PROBLEM: REQUIRES AN ANALOG SYSTEM WITH FULL DYNAMIC
RANGE (M, not m_i)
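
As an illustration of the decoding step, the sketch below uses the textbook form of the Chinese Remainder Theorem; the slide does not spell out the formula, so the construction (and the name `crt_decode`) is an assumption. Note that the sum is reduced modulo the full range M, which is the point of the "PROBLEM" line.

```python
from math import prod

def crt_decode(residues, moduli):
    """Recover |X|_M from its residues using the textbook CRT sum."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        M_i = M // m                    # product of all the other moduli
        inv = pow(M_i, -1, m)           # modular inverse (Python 3.8+)
        total += r * M_i * inv
    return total % M                    # reduction over the full range M

moduli = (2, 3, 5)
eleven = crt_decode((1, 2, 1), moduli)
twenty_eight = crt_decode((0, 1, 3), moduli)
```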
THE RESIDUE NUMBER SYSTEM

MIXED-RADIX CONVERSION:

CONVERSION FROM THE RESIDUE REPRESENTATION TO A WEIGHTED
NUMBER SYSTEM

- REQUIRES OPERATIONS MODULO m_i ONLY

- REQUIRES N-1 SEQUENTIAL STAGES

[Figure: mixed-radix conversion diagram; details not recoverable from the scan]

RESIDUE ARITHMETIC

ADDITION: PERFORM MODULO m_i ADDITION OF
CORRESPONDING RESIDUES FOR EACH MODULUS

EXAMPLE: X_1 + X_2      7 => (1,1,2)
                      + 4 => (0,1,4)
                             (1,2,1) <= 11

SUBTRACTION: FIND THE ADDITIVE INVERSE OF THE
SUBTRAHEND AND THEN PERFORM ADDITION

EXAMPLE: X_1 - X_2        7 => (1,1,2)
= X_1 + (-X_2)       + (-4) => (0,2,1)
                               (1,0,3) <= 3

MULTIPLICATION: MULTIPLY CORRESPONDING RESIDUES
AND FIND THE RESIDUE OF THE PRODUCT MODULO m_i

EXAMPLE: X_1 * X_2      7 => (1,1,2)
                      x 4 => (0,1,4)
                             (0,1,3) <= 28

DIVISION: GENERALLY NOT POSSIBLE
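
The three carry-free operations can be sketched digit by digit in Python. This is a minimal illustration over the example moduli 2, 3, 5; the helper names are not from the report.

```python
MODULI = (2, 3, 5)

def to_rns(x):
    """Residue representation of a non-negative integer."""
    return tuple(x % m for m in MODULI)

def rns_add(a, b):
    """Add corresponding residues, each in its own modulus."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def rns_neg(a):
    """Additive inverse of every residue digit."""
    return tuple((m - x) % m for x, m in zip(a, MODULI))

def rns_sub(a, b):
    """Subtract by adding the additive inverse of the subtrahend."""
    return rns_add(a, rns_neg(b))

def rns_mul(a, b):
    """Multiply corresponding residues, each in its own modulus."""
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

seven, four = to_rns(7), to_rns(4)
total = rns_add(seven, four)        # residues of 11
difference = rns_sub(seven, four)   # residues of 3
product = rns_mul(seven, four)      # residues of 28
```

No digit ever consults its neighbours, so the N channels can run in parallel.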


FEATURES OF THE RESIDUE NUMBER SYSTEM

* ABILITY TO DECOMPOSE A CALCULATION INTO SUBCALCULATIONS
OF REDUCED COMPUTATIONAL COMPLEXITY

-> SUBCALCULATION ACCURACY REQUIREMENT COMMENSURATE WITH
PARTICULAR MODULUS

* SUBCALCULATIONS ARE INHERENTLY INDEPENDENT AND ARE
PERFORMED IN SEPARATE UNITS

-> PARALLEL CARRY-FREE ADDITION, SUBTRACTION, AND
MULTIPLICATION

* LARGE DYNAMIC RANGE

- EXPANDABLE BY INCLUSION OF ADDITIONAL MODULI

* RNS IS A CYCLIC NUMBER SYSTEM:

- INTERMEDIATE COMPUTATION RESULTS CAN OVERFLOW SYSTEM
DYNAMIC RANGE WITHOUT ERROR IN RESULT

EXAMPLE: (7*5) - 9 = 35 - 9 = 26

(1,1,2) * (1,2,0) = (1,2,0)

(1,2,0) - (1,0,4) = (1,2,0) + (1,0,1) = (0,2,1)
PROBLEMS WITH THE RESIDUE NUMBER SYSTEM

SOME OPERATIONS IN THE RNS ARE SLOWED BY
THE NECESSITY TO CONVERT TO A WEIGHTED
REPRESENTATION, WHICH IS INHERENTLY SEQUENTIAL

* RELATIVE MAGNITUDE COMPARISON

* ALGEBRAIC SIGN DETECTION

* DYNAMIC RANGE OVERFLOW DETECTION

* DIVISION


OPTICS AND RNS:
HISTORICAL PERSPECTIVE
MOTIVATION

MANY RNS FEATURES COUPLE WELL WITH OPTICS

* REDUCED DYNAMIC RANGE REQUIREMENTS OF PROCESSING UNITS
>2, but <<1000

* PARALLEL, CARRY-FREE COMPUTATION

* CYCLIC NATURE OF RESIDUE REPRESENTATION COUPLES WITH
NATURAL CYCLIC PHENOMENA FOUND IN LIGHT BEAMS
(polarization, phase, diffraction)

* CONVENIENT FOR LOOK-UP TABLE PROCESSING

- FAN-OUT / FAN-IN CAPABILITIES

- 3-D INTERCONNECTS
ANCIENT OPTICAL SYSTEM

* PHOTO-ELECTRIC NUMBER SIEVE, D.H. LEHMER, 1932

- FOR HIGH DYNAMIC RANGE COMPUTING BEFORE EMERGENCE
OF THE ELECTRONIC COMPUTER

- EMPLOYS LIGHT BEAM AND PHOTO-CELLS TO SENSE OPENINGS
AND CLOSINGS OF MECHANICAL SIEVE

- 30 GEARS, ONE FOR EACH PRIME TO 113

- USED TO FACTOR NUMBERS, SUCH AS THE MERSENNE NUMBER

- SIFTED 20,000,000 NUMBERS PER HOUR

ANALOG REPRESENTATION

* A. HUANG, 1975

- FIRST TO SUGGEST COUPLING PROPERTIES OF RNS WITH
OPTICAL NUMERIC COMPUTING

* S. COLLINS, 1977

- PROCESSOR BASED ON RESIDUE REPRESENTATION BY
POLARIZATION AND PHASE STATES OF OPTICAL BEAM

PROBLEM: REQUIRED RESOLUTION OF OPTICAL COMPONENTS
for m = 37 => resolution of 2π/37 radians
HISTORICAL PERSPECTIVE

BINARY-CODED RESIDUE (BCR) BASED SYSTEMS

- RESIDUES ARE REPRESENTED IN THEIR BINARY FORM

- REQUIRES ⌈log2(m_i)⌉ BITS PER MODULUS

* CONTENT-ADDRESSABLE MEMORY, C. GUEST AND T. GAYLORD, 1980

- BASED ON TRUTH-TABLE LOOK-UP PROCESSING

- CAM STORES THE CANONICAL SUM-OF-PRODUCTS REPRESENTATION
OF EACH OUTPUT BIT

- EMPLOY LOGICAL MINIMIZATION TO REDUCE THE NUMBER OF
REFERENCE PATTERNS THE CAM MUST STORE
HISTORICAL PERSPECTIVE

POSITION-CODED RESIDUE (PCR) BASED SYSTEMS

- RESIDUES ARE REPRESENTED BY THE SPATIAL POSITION OF
LIGHT

* THE OPTICAL WHEEL, F. HORRIGAN AND W. STONER, 1979

- PROMPTED BY CYCLIC NATURE OF KALEIDOSCOPE OPTICS

- "OPTICAL PISTON" DEMONSTRATED MAPPING INPUT POSITIONS
INTO CYCLIC OUTPUT POSITIONS

* CORRELATION APPROACH, D. PSALTIS AND D. CASASENT, 1979

- LINEAR SYSTEM, CORRELATION-BASED FORMULATION OF RESIDUE
ARITHMETIC

- EMPLOY POSITION CODING, CARRIER MODULATION AND APERTURE
CONTROL TO ACHIEVE RNS OPERATIONS

* MAPPING APPROACH, A. HUANG, ET AL. 1979

- IMPLEMENT RESIDUE ARITHMETIC OPERATIONS WITH MAPS WHICH
PERFORM PERMUTATION OF THE INPUT DATA

- MAP BANKS ARE USED TO REALIZE CHANGEABLE MAPS

- EMPLOYS CHANGEABLE MAPS FOR CYCLIC PERMUTATIONS
RNS LOOK-UP TABLE PROCESSING

THE BASICS

* PCR ENCODING IN "ONE OUT OF m" CONFIGURATION

[Figure: position-coded residue, positions 0 1 2 3 4]

* TABLE LOOK-UP PROCESSING:

MODULO m LUTs REQUIRE m^2 TABLE ENTRIES


EXAMPLES: MODULO 5 ADDITION AND MULTIPLICATION OF 2 AND 3

+ | 0 1 2 3 4          x | 0 1 2 3 4
--+----------          --+----------
0 | 0 1 2 3 4          0 | 0 0 0 0 0
1 | 1 2 3 4 0          1 | 0 1 2 3 4
2 | 2 3 4 0 1          2 | 0 2 4 1 3
3 | 3 4 0 1 2          3 | 0 3 1 4 2
4 | 4 0 1 2 3          4 | 0 4 3 2 1

ANSWER = 0             ANSWER = 1


PROCESSING  IS  REALIZED  IN  MANY  WAYS: 

-  INPUT  POSITIONS  PROVIDE  ROW  AND  COLUMN  ADDRESS  FOR  LUT 

-  FOR  A  GIVEN  INPUT  WORD,  SECOND  INPUT  SELECTS  MAP  WHICH 
CORRECTLY  PERMUTES  INPUTS  FOR  GIVEN  OPERATION 
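
The row-and-column addressing described above can be imitated in software. The sketch below is only a functional model: `one_of_m` stands in for the position code and list indexing stands in for the optical addressing, and all names are illustrative.

```python
def build_lut(m, op):
    """m x m table; entry [a][b] holds op(a, b) reduced modulo m."""
    return [[op(a, b) % m for b in range(m)] for a in range(m)]

def one_of_m(r, m):
    """Position code: a single 1 at position r out of m."""
    return [1 if i == r else 0 for i in range(m)]

def lookup(lut, a_code, b_code):
    """Use the lit positions of the two inputs as row and column address."""
    m = len(lut)
    return one_of_m(lut[a_code.index(1)][b_code.index(1)], m)

m = 5
ADD = build_lut(m, lambda a, b: a + b)
MUL = build_lut(m, lambda a, b: a * b)
sum_code = lookup(ADD, one_of_m(2, m), one_of_m(3, m))    # 2 + 3 modulo 5
prod_code = lookup(MUL, one_of_m(2, m), one_of_m(3, m))   # 2 * 3 modulo 5
entries = sum(len(row) for row in ADD)                    # m^2 table entries
```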
RNS LOOK-UP TABLE PROCESSING

KEY FEATURES

ADDITION MODULO 5      MULTIPLICATION MODULO 5

+ | 0 1 2 3 4          x | 0 1 2 3 4
--+----------          --+----------
0 | 0 1 2 3 4          0 | 0 0 0 0 0
1 | 1 2 3 4 0          1 | 0 1 2 3 4
2 | 2 3 4 0 1          2 | 0 2 4 1 3
3 | 3 4 0 1 2          3 | 0 3 1 4 2
4 | 4 0 1 2 3          4 | 0 4 3 2 1

* ADDITION TABLE

- NOTE THAT ROW AND COLUMN ENTRIES SPAN THE RESIDUE
SET IN A CYCLIC FASHION

- NOTE THE CROSS-DIAGONAL SYMMETRY PRESENT IN THE TABLE

- NOTE THE CYCLIC PERMUTATION OF THOSE CROSS-DIAGONALS

* MULTIPLICATION TABLE

- NOTE ZERO ROW AND COLUMN RESULTANT FROM MULTIPLICATION
BY ZERO

- EXCLUSION OF ZERO ROW AND COLUMN RESULTS IN ROWS AND
COLUMNS THAT SPAN A REDUCED RESIDUE SET
PROPERTIES OF RESIDUE ARITHMETIC

ADDITIONAL PROPERTIES OF THE RNS CAN BE
EXPRESSED IN TERMS OF GROUP THEORY

* THE SET OF INTEGERS 0,1,2,...,m-1 (residues of MODULUS m)
FORMS A GROUP UNDER ADDITION MODULO m

* THE SET OF INTEGERS 1,2,3,...,m-1 (reduced residue set)
FORMS A GROUP UNDER MULTIPLICATION MODULO m (FOR PRIME m)

* BOTH OF THESE GROUPS ARE CYCLIC UNDER THE GIVEN OPERATION

- A CYCLIC GROUP MUST HAVE AT LEAST ONE GENERATOR

* THE GENERATOR IS AN ELEMENT OF THE GROUP

- SUCCESSIVE GROUP OPERATIONS UPON THE GENERATOR
DETERMINE THE CYCLIC SEQUENCE

-> "1" IS ALWAYS A GENERATOR FOR ADDITION MODULO m

-> GENERATORS FOR MULTIPLICATION MODULO m DEPEND UPON m
EXAMPLE: MODULO 5 GROUP OPERATIONS

GROUP OPERATION OF ADDITION:

- THE RESIDUES (0,1,2,3,4) FORM A CYCLIC GROUP UNDER
ADDITION MODULO 5

- 1 IS ALWAYS A GROUP GENERATOR

- THE OPERATION TABLE IS IDENTICAL TO THE LOOK-UP TABLE

GROUP OPERATION OF MULTIPLICATION:

- THE INTEGERS (1,2,3,4) FORM A CYCLIC GROUP UNDER
MULTIPLICATION MODULO 5

- 3 IS A GROUP GENERATOR

3^0 = 1   3^1 = 3   3^2 = 4   3^3 = 2   (3^4 = 3^0 = 1)

- CYCLIC ORDERING IS THEN (1,3,4,2)

- OPERATION TABLE IS REALIZED AS REDUCED (m-1 x m-1)
TRUTH TABLE WITH ROWS AND COLUMNS RESEQUENCED


* NOTICE THAT THE LUT NOW EXHIBITS ALL THE CYCLIC PROPERTIES
LOGARITHMIC TRANSFORMATION
FOR MULTIPLICATION

* SINCE THE MULTIPLICATION TABLE CAN BE TRANSFORMED INTO A
GROUP OPERATION TABLE, THE CHOICE OF GROUP OPERATION
BECOMES ARBITRARY

* THUS, THE RESEQUENCED MULTIPLICATION LOOK-UP TABLE CAN BE
REPLACED WITH AN ADDITION TABLE

-> NEED ONLY TO KEEP TRACK OF THE PERMUTATION MAPPING

LOGARITHMIC TRANSFORMATION

- THE GENERATOR PROVIDES THE KEY TO THE TRANSFORMATION

- TAKING THE GENERATOR-BASED LOG OF THE INTEGER POWERS
WHICH GENERATE THE GROUP PROVIDES A TRANSFORMATION
WHICH MAPS THE 1,2,...,m-1 CYCLIC GROUP UNDER
MULTIPLICATION MODULO m ONTO THE CYCLIC
GROUP UNDER ADDITION MODULO m-1

* MULTIPLICATION MODULO m -> ADDITION MODULO m-1

SUMMARY: MODULAR ADDITION IS A CYCLIC GROUP OPERATION AND
MODULAR MULTIPLICATION CAN BE TRANSFORMED INTO A CYCLIC GROUP
OPERATION
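
A minimal Python sketch of the logarithmic transformation, assuming a prime modulus and a known generator g (m = 5, g = 3 as in the modulo 5 example). The table and function names are illustrative, and zero inputs are bypassed in software where the hardware would use a zero-detector.

```python
def log_tables(m, g):
    """Generator-based log and antilog maps for the residues 1..m-1."""
    log, antilog = {}, {}
    x = 1
    for exponent in range(m - 1):
        log[x] = exponent
        antilog[exponent] = x
        x = (x * g) % m
    if len(log) != m - 1:
        raise ValueError("g does not generate the reduced residue set modulo m")
    return log, antilog

def multiply_by_adding(a, b, m, log, antilog):
    """Multiplication modulo m carried out as addition modulo m-1."""
    if a == 0 or b == 0:
        return 0                      # zero has no logarithm; bypass it
    return antilog[(log[a] + log[b]) % (m - 1)]

LOG, ANTILOG = log_tables(5, 3)       # cyclic ordering 1, 3, 4, 2
result = multiply_by_adding(2, 3, 5, LOG, ANTILOG)
```

The permutation that must be tracked is exactly the `log` map on the way in and the `antilog` map on the way out.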
RNS LOOK-UP TABLE PROCESSING

TWO RECENT APPROACHES

* MOST LUT ARCHITECTURES PROPOSED REQUIRE ACTIVE ELEMENTS FOR
EACH TABLE ENTRY

-> SPATIAL COMPLEXITY = m^2

* THIS IMPOSES LIMITATIONS ON THE RANGE OF ACCEPTABLE MODULI

* RECENT EFFORTS HAVE CENTERED ON EXPLOITING THE CYCLIC GROUP
PROPERTIES OF RESIDUE ARITHMETIC TO REDUCE NUMBER OF LUT
ENTRIES

FACTORED LOOK-UP TABLES

WESTINGHOUSE: GOUTZOULIS, MALARKEY, DAVIES, BRADLEY,
and BEAUDET

CROSS-DIAGONAL-SYMMETRIC LOOK-UP TABLES

BOEING AEROSPACE: C. CAPPS, R. FALK AND T. HOUK
FACTORED LOOK-UP TABLES

* THE RNS REPRESENTATION CAN BE VIEWED AS "FACTORING" AT
THE FIRST LEVEL; PROPERTIES OF MULTIPLICATION MODULO m
ALLOW FACTORING AT A SECOND LEVEL

ALGORITHM: "DIRECT" METHOD

- FOR PRIME m, m-1 IS EVEN AND CAN BE FACTORED INTO
PAIRWISE RELATIVE PRIMES

m - 1 = k_1 * k_2 * ... * k_p

- GENERATORS SPECIFY THE ELEMENTS OF p SUBGROUPS

- iTH SUBGROUP HAS DIMENSION k_i

- SUBGROUPS ARE CYCLIC (THEOREM)

- EACH RESIDUE DIGIT, FOR A PARTICULAR MODULUS m_j, IS NOW
REPRESENTED AS A p-TUPLE OF SUB-RESIDUES

R_j => (R_j1, R_j2, ..., R_jp)

- PROCESSING IS PERFORMED IN p SEPARATE LUTs OF DIMENSION
k_i x k_i

* ALSO PERFORM FACTORING WITH "LOGARITHMIC" METHOD
* FACTORED ADDITION IS MORE COMPLEX FOR EITHER METHOD


EXAMPLE: FACTORED LUT PROCESSING

DIRECT APPROACH

MODULUS = 7

RESIDUES: 0,1,2,3,4,5,6
REDUCED SET: 1,2,3,4,5,6
FACTORS: 6 = 2*3

GENERATORS: 2^0 = 1   2^1 = 2   2^2 = 4   (2^3 = 2^0 = 1)
            6^0 = 1   6^1 = 6   (6^2 = 6^0 = 1)

SUBGROUPS: 2 ELEMENT -> (1,6)
           3 ELEMENT -> (1,2,4)

ENCODING: 1=(1,1)  2=(1,4)  3=(6,2)  4=(1,2)  5=(6,4)  6=(6,1)

LUTs:

x | 1 6        x | 1 2 4
--+----        --+------
1 | 1 6        1 | 1 2 4
6 | 6 1        2 | 2 4 1
               4 | 4 1 2

EX: 3*6 MODULO 7 -> (6,2)*(6,1)
                    (1,2) -> 4

NOTE: COULD HAVE PERFORMED THE MULTIPLICATION IN ADDITION
TABLES USING THE LOGARITHMIC TRANSFORMATION
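
The modulo 7 example can be replayed in a short Python sketch. One assumption is made loudly: the slide lists the encoding but not the rule behind it, and the rule used here, sub-residue i = x^((m-1)/k_i) modulo m, is an inference chosen because it reproduces the listed code words (for instance 3 => (6,2) and 6 => (6,1)). All names are illustrative.

```python
M = 7
FACTORS = (2, 3)          # m - 1 = 6 = 2 * 3, pairwise relatively prime

def encode(x):
    """Assumed rule: sub-residue i is x^((m-1)/k_i) modulo m."""
    return tuple(pow(x, (M - 1) // k, M) for k in FACTORS)

# Decoder: invert the encoding over the reduced residue set 1..6.
DECODE = {encode(x): x for x in range(1, M)}

def sub_lut(elements):
    """k x k multiplication table over one cyclic subgroup."""
    return {(a, b): (a * b) % M for a in elements for b in elements}

LUT_2 = sub_lut((1, 6))       # 2-element subgroup, 4 entries
LUT_3 = sub_lut((1, 2, 4))    # 3-element subgroup, 9 entries

def factored_multiply(x, y):
    """Multiply modulo 7 using the two small LUTs instead of one 6 x 6 table."""
    if x == 0 or y == 0:
        return 0              # handled by zero detection ahead of the LUTs
    (a1, b1), (a2, b2) = encode(x), encode(y)
    return DECODE[(LUT_2[(a1, a2)], LUT_3[(b1, b2)])]

answer = factored_multiply(3, 6)
total_entries = len(LUT_2) + len(LUT_3)   # 13, against 36 for the full reduced table
```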




CROSS-DIAGONAL-SYMMETRIC LUT

* EXPLOIT THE CROSS-DIAGONAL SYMMETRY PRESENT IN GROUP TABLES

* A TABLE OPERATION NEED ONLY FIND THE CORRECT CROSS-DIAGONAL

* NOTE CORRESPONDENCE BETWEEN PCR INPUT SOURCE SPACING AND
CROSS-DIAGONAL OF TABLE

ALGORITHM:

* ONLY PERFORM OPERATIONS ON CYCLIC TABLES

- ADDITION MODULO m IS CYCLIC

- MULTIPLICATION MODULO m CAN BE TRANSFORMED INTO
ADDITION MODULO m-1

* ARRANGE INPUTS IN LINEAR ARRAY

* CALCULATE EFFECTIVE DISTANCE BETWEEN INPUTS

-> SPATIAL COMPLEXITY IS REDUCED TO LINEAR DIMENSION
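
A hedged functional sketch of the cross-diagonal idea for addition: every entry on one cross-diagonal of the addition table shares the same value, so a one-dimensional table indexed by the diagonal is enough. The integer sum a + b stands in for the diagonal index here; how the optical system derives that index from the source spacing is not modeled, and the names are illustrative.

```python
def diagonal_table(m):
    """One entry per cross-diagonal of the modulo-m addition table."""
    return [d % m for d in range(2 * m - 1)]

def diagonal_add(a, b, table):
    """Residues a and b select cross-diagonal a + b; its entry is the sum."""
    return table[a + b]

m = 5
TABLE = diagonal_table(m)             # 2m - 1 = 9 entries instead of m^2 = 25
answer = diagonal_add(2, 3, TABLE)    # 2 + 3 modulo 5
```

Multiplication would reuse the same structure after the logarithmic transformation, with addition taken modulo m-1.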



[Figure: modular adder diagram; caption and source citation not legible in the scan]
OPTICAL RNS LUT PROCESSING

PERFORMANCE ANALYSIS

APPROACH

* TO ANALYZE CURRENT LOOK-UP TABLE ARCHITECTURES (WITH
RESPECT TO TRADITIONAL LUT ARCHITECTURES) AS ARITHMETIC
PROCESSING UNITS.

- BUILD PROCESSING UNITS FROM FUNDAMENTAL COMPUTATIONAL
COMPONENTS

- COMPATIBLE INPUT AND OUTPUT DATA FORMATS

-> SPECIFY PERFORMANCE CRITERIA

-> SPECIFY AND ANALYZE COMPUTATIONAL COMPONENTS

-> SPECIFY AND ANALYZE FUNDAMENTAL COMPUTATIONAL UNIT

-> EVALUATE DIFFERENT ALGORITHMS/ARCHITECTURES


PERFORMANCE CRITERIA

* TEMPORAL COMPLEXITY (C_T):

NUMBER OF SEQUENTIAL STAGES

* SPATIAL COMPLEXITY (C_S):

NUMBER OF DECISION MAKING ELEMENTS

* INTERCONNECT COMPLEXITY (C_I):

- AVERAGE FAN-IN (C_FI) AND FAN-OUT (C_FO) PER CHANNEL

- "SHIFT VARIANT" OR "SHIFT INVARIANT"

* ELEMENT COMPLEXITY (C_E):

AVERAGE NUMBER OF RESOLVABLE LEVELS
PROCESSING COMPONENTS:
DEFINITION AND EVALUATION

TRANSFORMATION STAGE

FOR LOGARITHMIC TRANSFORMATIONS OR ANY FIXED PERMUTATION

[Figure: transformation stage, modulo 5 example]

INPUT ENCODER

FOR ENCODING INPUTS INTO FACTORED REPRESENTATION

C_T = 0
C_S = 0

[Figure: factor encoding]


PROCESSING COMPONENTS:
DEFINITION AND EVALUATION

OUTPUT DECODER

DECODING FACTORED OUTPUTS INTO POSITION CODE

C_T = 1
C_S = (m-1), number of output lines

ZERO DETECTOR

REMOVING ZERO INPUT FOR MULTIPLICATION

C_T = 1
C_S = 1
C_FI = 2
C_E = 1:0

[Figure: zero detector]

PROCESSING COMPONENTS:
BOEING ADDITION LOOK-UP TABLE

DESCRIPTION:

2 m-POSITION INPUTS
m-POSITION OUTPUT
2m-1 LUT ENTRIES

[Figure: inputs, LUT, output]

EVALUATION:

C_T = 1
C_S = 2m-1
C_FO => architecture dependent
C_FI => architecture dependent
C_E => architecture dependent
PROCESSING COMPONENTS:
BOEING MULTIPLICATION LOOK-UP TABLE

DESCRIPTION:

INPUT ZERO DETECTION
INPUT TRANSFORMATION
2 (m-1)-POSITION INPUTS
(m-1)-POSITION OUTPUT

EVALUATION:

C_T => C_T of transformation
C_S = 2m-2
C_FO => architecture dependent
C_FI => architecture dependent
C_E => architecture dependent


PROCESSING COMPONENTS:
WESTINGHOUSE FACTORED MULTIPLICATION
LOOK-UP TABLE

DESCRIPTION:

INPUT ZERO DETECTION
2 k_i-POSITION INPUTS, i = 1 TO p
k_i-POSITION OUTPUT
k_i^2 LUT ENTRIES

[Figure: inputs 0, k1, k2; output]

EVALUATION:

C_T = 1
C_S = [not legible in the scan]
C_FO = k_i SHIFT INV
C_FI = k_i SHIFT VAR
C_E = 2:1
PROCESSING COMPONENTS:
WESTINGHOUSE FACTORED ADDITION LUT

DESCRIPTION:

INPUT ZERO DETECTION AND PASS-THRU
EXTRA INPUT 2-BIT "L" or "U" WORD
2 k_i (i = 1 to p) + 4 POSITION INPUTS
k_i + 2 POSITION OUTPUT
k_i^2 + 4 LUT ENTRIES
"FEEDBACK LUT" RECODES ADDITION OUTPUTS

[Figure: inputs 0, k1, k2, U/L; interconnect k_i SHIFT VAR, followed by decoder and encoder]


NOTES  ON  FACTORED  ADDITION  LUTs 

Westinghouse  has  proposed  two  main  approaches  to 
factored  addition:  "direct"  and  "logarithmic."  The 
approaches  differ  greatly  in  both  algorithm  and 
architecture.  In  this  analysis,  we  focus  on  the  "direct" 
method. 

First, recall that the motivation for LUT factoring was
based on multiplication by zero. The key property is that a
zero result can only be produced if at least one input is
zero, and hence simple zero-detection removes zero from
both the input and the output. However,
removing zero from the input for modular addition is a
different case. Zero can result from additions where
neither addend is zero (e.g., 5+2 modulo 7 = 0). Therefore,
the LUTs for addition must be capable of identifying m
separate output channels, but the factored LUTs can only
produce m-1 outputs. Additional information must be carried
through the system to produce the extra output.
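
The asymmetry between the two operations is easy to confirm numerically. The following Python sketch (the helper name is illustrative) lists the pairs of non-zero residues modulo 7 that produce a zero result under each operation.

```python
def zero_result_pairs(m, op):
    """Pairs of non-zero residues (a, b) whose result under op is 0 modulo m."""
    return [(a, b)
            for a in range(1, m)
            for b in range(1, m)
            if op(a, b) % m == 0]

add_zeros = zero_result_pairs(7, lambda a, b: a + b)
mul_zeros = zero_result_pairs(7, lambda a, b: a * b)
```

For the prime modulus 7 the multiplication list is empty, while addition yields one zero-producing partner for every non-zero residue, which is why the extra output channel is needed.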

In  terms  of  processing  components,  there  are  three 
additional  units  required  to  perform  factored  addition  over 
factored  multiplication.  Their  description  and  related 
complexities  are  discussed  here. 

(1)  Zero-detection  with  pass-thru  capability:  note  that 
addition  by  zero  must  pass  the  other  input,  unperturbed,  to 
the  output.  This  corresponds  to  an  additional  spatial 
complexity  equal  to  the  total  number  of  non-zero  input 
channels.  This  is  the  term  linear  in  k_i. 

(2) Additional "U/L" 2x2 LUT: the LUT is appended to
identify each input as being in the upper or lower half of the
range. The table has three output channels: U for both
inputs U, L for both inputs L, and B for one input of each.
This corresponds to a constant spatial complexity of 4.

(3) "Feedback LUT": can be realized as a decoder, which
detects all 3(m-1) possible output states, cascaded with an
encoder, which transmits the correct output based on the
"feedback" rules for addition. This most general case
corresponds to the spatial complexity term linear in m (3m-3
for the decoder), with a required element dynamic range of
p+1:1.
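As a rough illustration of items (2) and (3), the sketch below models the U/L table as a four-entry dictionary with three output channels and counts the decoder states. The split point between the lower and upper halves, the function names, and the reading of 3(m-1) as three U/L channels times m-1 factored outputs are all assumptions of this sketch; the feedback rules themselves are not reproduced here.

```python
def half(x, m):
    """Label a nonzero residue 'L' (lower half) or 'U' (upper half).

    The split point (m - 1) // 2 is an assumption of this sketch.
    """
    if not 1 <= x <= m - 1:
        raise ValueError("zero is handled by the zero-detection stage")
    return "L" if x <= (m - 1) // 2 else "U"

# 2x2 table: four entries, three distinct output channels (U, L, B).
UL_LUT = {
    ("U", "U"): "U",
    ("L", "L"): "L",
    ("U", "L"): "B",
    ("L", "U"): "B",
}

def feedback_decoder_states(m):
    """States the feedback decoder must detect: 3 * (m - 1)."""
    return len(set(UL_LUT.values())) * (m - 1)

m = 7
label = UL_LUT[(half(5, m), half(2, m))]
print(label)                                    # B
print(len(UL_LUT), feedback_decoder_states(m))  # 4 18
```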

Note that with the "direct" approach, the data format
for addition differs from that for multiplication, so an
additional re-encoding stage is required between any
successive addition and multiplication. This result is
best seen when the MAU is constructed. Westinghouse has
introduced the "logarithmic" method for addition (based on
the logarithmic transformation we discussed earlier), which
provides a common data representation in both LUTs. This
eliminates the need for a re-encoding stage. However, the
addition algorithm introduces three sequential operations.


TABLE: COMPONENT COMPLEXITY

[Table only partly legible. Rows: TRANSFORM, ENCODER, DECODER, Z. DETECT, m x m LUT, B-ADD, B-MULT, W-MULT, W-ADD. Legible entries: B-ADD 2m-1; B-MULT 2m-2; W-MULT and W-ADD interconnects k SV, k SI; W-ADD complexity ending in 2k + 3m + 1, with element dynamic range p+1:1. The entries (m-1)/k + p; p, (m-1)/k; and mSV, mSI cannot be assigned to rows.]


DEFINING  THE  COMPUTATIONAL  UNIT 

THE MULTIPLY-ACCUMULATE UNIT

*  THE  MAU  PERFORMS  THE  "SUM  OF  PRODUCTS”  (SOP)  OPERATION 
WHICH  IS  FUNDAMENTAL  IN  NUMERICAL  COMPUTATION 

|C(i+1)|_m = | |A(i) B(i)|_m + |C(i)|_m |_m
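The sum-of-products recurrence can be sketched for a single modulus with two direct look-up tables. This is a minimal illustration, assuming the accumulate step has the form C(i+1) = A(i)B(i) + C(i) modulo m; the function names and the second operand sequence B are assumptions of this sketch, not taken from the original slides.

```python
def build_luts(m):
    """Direct m x m look-up tables for multiplication and addition mod m."""
    mult = [[(a * b) % m for b in range(m)] for a in range(m)]
    add = [[(a + b) % m for b in range(m)] for a in range(m)]
    return mult, add

def mau(a_seq, b_seq, m):
    """Accumulate sum_i A(i)*B(i) modulo m with two look-ups per step."""
    mult, add = build_luts(m)
    c = 0
    for a, b in zip(a_seq, b_seq):
        product = mult[a % m][b % m]  # look-up 1: |A(i) B(i)|_m
        c = add[product][c]           # look-up 2: add |C(i)|_m, reduce mod m
    return c

A = [3, 10, 25]
B = [4, 7, 2]
m = 23
result = mau(A, B, m)
print(result)  # 17, since 12 + 70 + 50 = 132 and 132 mod 23 = 17
```

Each step uses two sequential look-ups, one multiplication and one addition, regardless of the size of m.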


NOW, WE WILL SPECIFY:

- DIRECT MAU

- BOEING MAU

- WESTINGHOUSE FACTORED MAU


[Figure: THE WESTINGHOUSE MAU]


PERFORMANCE ANALYSIS RESULTS:

COMPLEXITY OF RNS LUT PROCESSORS


TABLE: COMPLEXITY OF PROCESSING UNITS

UNIT                TEMPORAL COMPLEXITY    ELEMENT COMPLEXITY

MULTIPLIER
  DIRECT            1                      2:1
  BOEING            1                      ARCH.-DEPENDENT
  WESTINGHOUSE      1                      2:1

ADDER
  DIRECT            1                      2:1
  BOEING            1                      ARCH.-DEPENDENT
  WESTINGHOUSE      2                      2:1, p+1:1

MAU
  DIRECT            2                      2:1
  BOEING            2                      ARCH.-DEPENDENT
  WESTINGHOUSE      3                      2:1, p:1, p+1:1


THE  SPATIAL  COMPLEXITY  SHOWS  A  STRONG  DEPENDENCE  ON 
MODULI,  WHICH  IS  BEST  SEEN  IN  GRAPHICAL  FORMAT 


[Figure: MAU Spatial Complexity. Vertical axis: Log (# of elements). Curves include Direct.]


NOTES  ON  SPATIAL  COMPLEXITY  RESULTS 

-  The  modulus  axis  is  not  linear. 

-  The  moduli  were  chosen  based  upon  those  listed  for
factoring  by  Goutzoulis  (LA-SPIE,  1988).

-  The  Complexity  axis  is  a  Log10  scale. 

MULTIPLIER SPATIAL COMPLEXITY

- The upper bound is quadratic in m (m^2).

- The Boeing plot is linear in m (2m-2).

- The Westinghouse plot is heavily dependent upon the
factorization.

- At m=157, both approaches exhibit approximately two orders
of magnitude reduction in complexity over the Direct.

- As m increases, the oscillation of the West. plot
dampens.

- Relative magnitudes:

m=23 -> factor of 3 between Boeing and West.

m=61 -> factor of 2.5 between West. and Boeing

m=157 -> factor of 1.5 between West. and Boeing

ADDER SPATIAL COMPLEXITY

- Upper bound and lower bound are quadratic and linear
complexities, respectively.

- Linear term in West. complexity dominates.

- Boeing complexity remains the same.

- At m=157, both approaches demonstrate reductions in
complexity by a factor of at least 35 over the Direct.

- Relative magnitudes:

m=23 -> factor of 5 between Boeing and West.

m=61 -> factor of 2 between Boeing and West.

m=157 -> factor of 2 between Boeing and West.

MAU SPATIAL COMPLEXITY

- Upper bound and lower bound are quadratic (2m^2) and
linear (4m) complexities, respectively.

- Results almost identical to adder results.

- Linear terms in West. adder dominate.

- At m=157, both approaches demonstrate reductions in
complexity by a factor of at least 35 over the Direct.

- Relative magnitudes:

m=23 -> factor of 4 between Boeing and West.

m=61 -> factor of 2 between Boeing and West.

m=157 -> factor of 2 between Boeing and West.
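The quoted bounds can be compared numerically at the three moduli listed above. The sketch below uses only the closed-form expressions stated in these notes (m^2 against 2m-2 for the multiplier, 2m^2 against 4m for the MAU). The Westinghouse counts depend on the factorization and are not modeled, and the function name is hypothetical.

```python
def bound_ratios(m):
    """Ratio of the quadratic bound to the linear bound for modulus m."""
    return {
        "multiplier": m**2 / (2 * m - 2),  # m^2 vs. 2m-2
        "mau": (2 * m**2) / (4 * m),       # 2m^2 vs. 4m
    }

ratios = {m: bound_ratios(m) for m in (23, 61, 157)}
for m, r in ratios.items():
    print(m, round(r["multiplier"], 1), round(r["mau"], 1))
# 23 12.0 11.5
# 61 31.0 30.5
# 157 79.0 78.5
```

At m=157 the ratio of about 79 is close to two orders of magnitude. Dividing it by the quoted factor of 2 between Boeing and West. gives roughly 39, which agrees with the "at least 35" figure.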
CONCLUSIONS


* THE SECOND LEVEL FACTORING AND THE USE OF CYCLIC
PROPERTIES BOTH LEAD TO LINEAR SPATIAL COMPLEXITY
LOOK-UP TABLE PROCESSORS.

* THE TIME COMPLEXITY IS INDEPENDENT OF MODULUS SIZE

* THE ELEMENT COMPLEXITY (DYNAMIC RANGE) SHOWS A MODERATE
DEPENDENCE ON MODULI

* GLOBAL, SPACE-VARIANT INTERCONNECTS WITH MODERATE
(10-100) FAN-IN AND FAN-OUT REQUIRED
HARDWARE CONSIDERATIONS:


A SUMMARY


THE RNS OPTICAL COMPUTING UNITS CONSIST OF:

- INTERCONNECTS (BETWEEN UNITS AND WITHIN UNITS)

- ACTIVE SWITCHING ELEMENTS (LUTs AND DECODERS)

THERE ARE A NUMBER OF CHOICES FOR EACH OF THE
CONSTITUENTS. THE HARDWARE SELECTION FOR EACH
CONSTITUENT WILL BE GUIDED BY THE ALGORITHM AND
ARCHITECTURE

THE SYSTEM PERFORMANCE (COMPUTATIONAL THROUGHPUT, POWER
CONSUMPTION, SPATIAL COMPLEXITY, INTERFACE
REQUIREMENTS, MECHANICAL STABILITY) IS DETERMINED BY
COMBINING THE HARDWARE CHARACTERISTICS WITH THE
ALGORITHM / ARCHITECTURE CHARACTERISTICS
INTERCONNECTS


BETWEEN COMPUTING UNITS (1-1)

ENCODERS (1-M, IRREGULAR, 1-D)

DECODERS (M-M, IRREGULAR, 1-D)

LOOK UP TABLES (1-M, REGULAR, PLANAR FOR INPUTS; M-1,
IRREGULAR, 3-D FOR OUTPUTS; M-1, REGULAR, PLANAR FOR
OUTPUTS IN SYMMETRIC)


TECHNOLOGIES

FIBER OPTIC INTERCONNECTS ARE FLEXIBLE, RUGGED, 3-D,
EFFICIENT AND EASY TO DEMONSTRATE
DON'T SCALE WELL
DIFFICULT TO AUTOMATE

INTEGRATED OPTICAL INTERCONNECTS ARE FLEXIBLE, RUGGED,
EFFICIENT, WILL SCALE WELL AND EASILY AUTOMATED
INHERENTLY PLANAR (NOT SUITED TO LUTs)

FREE-SPACE OPTICAL INTERCONNECTS ARE FLEXIBLE, SCALE
WELL AND EASILY AUTOMATED
TIGHT ALIGNMENT TOLERANCES
INEFFICIENT

FOURIER OPTICAL INTERCONNECTS HAVE SPECIAL SYMMETRIES
COHERENT ILLUMINATION REQUIRED


SWITCHING ELEMENTS


USES

• DECODERS (P-INPUT AND GATES)

• LOOK UP TABLES (2-INPUT AND GATES, *-INPUT WIRED OR
GATES - ENTAILS SPATIAL COMPLEXITY)

TECHNOLOGIES

• ALL-OPTICAL NONLINEAR DEVICES

HIGH SPEED, COMPATIBLE INPUT / OUTPUT, 2-D PARALLEL
OUTPUT, LOW FAN-IN / OUT, CONTRAST

HIGH POWER CONSUMPTION, IMMATURE TECHNOLOGY

• HYBRID TECHNOLOGIES

LASER DIODE ARRAYS
HIGH SPEED, LARGE POWER CONSUMPTION

GUIDED WAVE OPTICAL SWITCHES
HIGH SPEED, LARGE POWER CONSUMPTION, 1-D ARRAYS

SPECIAL PURPOSE DEVICES (1-D ACOUSTOOPTIC POINT
MODULATOR ARRAYS, MAGNETOOPTIC LIGHT MODULATOR)
IMMATURE TECHNOLOGIES, POWER-SPEED LIMITS NOT
WELL ESTABLISHED


INTERACTION BETWEEN ARCHITECTURES AND HARDWARE

* BOEING APPROACH OF UTILIZING LINEAR COMPLEXITY IS BASED
ON FOURIER OPTICS FOR SHIFT-INVARIANT INTERCONNECTS.

* ARCHITECTURE AMENABLE TO INTEGRATED OPTICAL
IMPLEMENTATION WITH HIGH SPEED MODULATORS

* THE LARGE AREA DETECTOR REQUIRED IN THE OUTPUT MAY POSE
THE PRIMARY LIMIT ON THE SPEED OF THE SYSTEM

* IF THE DETECTOR AREA CAN BE REDUCED, THE OVERALL SYSTEM
EFFICIENCY COULD CHANGE
SUMMARY


• RESIDUE NUMBER SYSTEM CAN BE EFFECTIVELY USED IN
APPLICATIONS WITH A LARGE NUMBER OF MULTIPLY-ADD/SUBTRACT
AND VERY FEW DIVISIONS, COMPARISONS, SIGN DETECTIONS

• POSITION CODED RESIDUE REPRESENTATION LEADS TO
EFFECTIVE LOOK UP TABLE COMPUTING

• SWITCHING REQUIREMENTS OF LUTs ARE MODEST IN TERMS
OF FAN-IN / OUT AND CONTRAST

• SPATIAL COMPLEXITY AND SWITCHING ELEMENT COMPLEXITY
DEPEND VERY STRONGLY ON THE ALGORITHM AND THE MODULUS

• IMPACT OF DIFFERENT DEVICE TECHNOLOGIES ON
ALGOTECTURE PERFORMANCE MERITS FURTHER STUDY

• DELINEATION OF DOMAINS OF APPLICABILITY OF RESIDUE
NUMBER SYSTEM MERITS FURTHER STUDY
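The first point of the summary can be illustrated with a short multiply-accumulate sketch in residue form. It is a minimal example under stated assumptions: the moduli 23, 61 and 157 are borrowed from the complexity notes purely for illustration, the function names are hypothetical, and decoding uses the standard Chinese Remainder Theorem rather than any scheme specific to this report. It needs Python 3.8 or later for math.prod and the three-argument pow with a negative exponent.

```python
from math import prod

MODULI = (23, 61, 157)  # pairwise coprime; illustrative choice only

def to_rns(x, moduli=MODULI):
    """Encode a non-negative integer as a tuple of residues."""
    return tuple(x % m for m in moduli)

def rns_mac(acc, a, b, moduli=MODULI):
    """acc + a*b, computed independently in each residue channel."""
    return tuple((c + x * y) % m for c, x, y, m in zip(acc, a, b, moduli))

def from_rns(residues, moduli=MODULI):
    """Chinese Remainder Theorem decoding into [0, prod(moduli))."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse of Mi mod m
    return total % M

pairs = [(12, 34), (56, 78), (90, 11)]
acc = to_rns(0)
for a, b in pairs:
    acc = rns_mac(acc, to_rns(a), to_rns(b))

value = from_rns(acc)
print(value)  # 5766 = 12*34 + 56*78 + 90*11
```

No information passes between residue channels during the accumulation; only the final decoding combines them, which is why operations such as comparison and sign detection are costly by contrast.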